Loading Libraries
library(tidyverse)
library(reshape2)
library(purrr)
library(viridis)
library(ggthemes)
library(assertthat)
library(plotly)
set.seed(921021)
Transform the \(x\) by using 9 Gaussian basis functions. Fix the parameters of your gaussian functions. Take note of the domain of the function when you select the parameter governing the spatial scale (i.e. \(s\)). Share your code and graph the results. You need to use your result in the succeeding questions. (1 point)
In this part I create the functions that are used throughout the script. They will be handy, as we create several different sizes of the dataset.
This function creates data of the specified length and returns a data frame with two vectors - x and target.
data_gen <- function(n){
x <- runif(n, min = -1, max = 1)
noise <- rnorm(n, mean = 0, sd = 0.1)
target <- sin(2 * pi * x) + noise
return(data.frame(x, target))
}
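As a quick sanity check - a sketch that restates data_gen so the chunk runs on its own - the generated x values stay inside the domain and the targets scatter tightly around \(\sin(2\pi x)\):

```r
# Standalone sanity check for data_gen: x stays in [-1, 1] and the
# target hugs the sinusoid.
data_gen <- function(n){
  x <- runif(n, min = -1, max = 1)
  noise <- rnorm(n, mean = 0, sd = 0.1)
  target <- sin(2 * pi * x) + noise
  return(data.frame(x, target))
}
set.seed(1)
d <- data_gen(50)
stopifnot(nrow(d) == 50, all(abs(d$x) <= 1))
# noise is N(0, 0.1), so deviations above 0.6 (six sigma) are
# practically impossible
stopifnot(all(abs(d$target - sin(2 * pi * d$x)) < 0.6))
```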
This is the Gaussian basis function. It transforms the vector using \(\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2s^2}\right)\).
gauss_basis <- function(x, mu, s_sq){
return(
exp(
(-(x - mu) ^ 2) / (2 * s_sq)
)
)
}
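A few properties worth checking (a standalone sketch that restates the function): the basis peaks at 1 when \(x = \mu\), equals \(e^{-1/2}\) at \(x = \mu \pm s\), and is symmetric around \(\mu\):

```r
# Standalone check of the basis function's shape.
gauss_basis <- function(x, mu, s_sq){
  return(exp((-(x - mu) ^ 2) / (2 * s_sq)))
}
stopifnot(
  gauss_basis(0.5, mu = 0.5, s_sq = 0.02) == 1,          # peak at mu
  isTRUE(all.equal(gauss_basis(0.5 + sqrt(0.02), 0.5, 0.02),
                   exp(-0.5))),                          # value at mu + s
  isTRUE(all.equal(gauss_basis(0.3, 0.5, 0.02),
                   gauss_basis(0.7, 0.5, 0.02)))         # symmetry
)
```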
This function takes a dataset created by data_gen and transforms it using 9 Gaussian basis functions. It returns an n-by-11 data frame, where the 1st column is x, the 2nd is target, and the next 9 are the transformed Gaussian bases. I can also modify \(s^2\); it defaults to 0.02. I will shortly show why.
gauss_basis_gen <- function(data_frame, s_sq = 0.02) {
result <- data_frame
counter <- 1
# loop over the 9 basis centres, evenly spaced over [-1, 1]
for (i in seq(-1, 1, 0.25)) {
result <- result %>%
mutate(gauss_basis(x, mu = i, s_sq = s_sq))
colnames(result)[ncol(result)] <- paste('gauss', counter, sep = "_")
counter <- counter + 1
}
return(result)
}
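The same transformation can be sketched in base R with sapply, which builds the \(n \times 9\) design matrix directly (column names chosen to match gauss_basis_gen):

```r
# Base-R sketch of the transformation done by gauss_basis_gen:
# evaluate each of the 9 bases at every x and bind the columns.
gauss_basis <- function(x, mu, s_sq) exp(-(x - mu)^2 / (2 * s_sq))
mus <- seq(-1, 1, 0.25)   # the same 9 centres over the domain [-1, 1]
set.seed(2)
x <- runif(100, min = -1, max = 1)
phi <- sapply(mus, function(mu) gauss_basis(x, mu, s_sq = 0.02))
colnames(phi) <- paste("gauss", seq_along(mus), sep = "_")
stopifnot(nrow(phi) == 100, ncol(phi) == 9,
          colnames(phi)[9] == "gauss_9")
```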
This function quickly creates test data; it will be handy throughout the analysis. The variables are stored in the global environment. Because it is otherwise annoying in the editor, I also create them once here; they will also be useful for creating the graphs.
values_gb <- data_gen(100) %>%
gauss_basis_gen(s_sq = 0.02)
phi_mat <- values_gb %>%
select(starts_with("gauss")) %>%
as.matrix()
t_mat <- as.matrix(values_gb$target)
create_test_data <- function(n = 100, s_sq = 0.1) {
values_gb <<- data_gen(n) %>%
gauss_basis_gen(s_sq)
phi_mat <<- values_gb %>%
select(starts_with("gauss")) %>%
as.matrix()
t_mat <<- as.matrix(values_gb$target)
}
my_theme <- theme(plot.subtitle = element_text(vjust = 1),
plot.caption = element_text(vjust = 1),
panel.grid.major = element_line(colour = "gray94"),
panel.grid.minor = element_line(linetype = "blank"),
panel.background = element_rect(fill = NA))
Now we generate data using the presented functions and plot them.
create_test_data(s_sq = 0.02)
values_gb %>%
melt(id.vars = c("x", "target"), variable.name = "gauss_fun") %>%
ggplot(aes(x = x, y = value)) +
geom_line(aes(colour = gauss_fun), size = 1) +
geom_point(aes(y = target, colour = "target"), alpha = 0.1) +
ggtitle("Gaussian bases and target data for s_sq = 0.02") + my_theme
Selecting \(s^2\) is tricky. Setting \(s^2 = 0.02\) seems like a good option, as four of the Gaussian bases then fit nicely along the sinusoid.
We can also have a look at \(s^2 = 0.1\):
create_test_data(s_sq = 0.1)
values_gb %>%
melt(id.vars = c("x", "target"), variable.name = "gauss_fun") %>%
ggplot(aes(x = x, y = value)) +
geom_line(aes(colour = gauss_fun), size = 1) +
geom_point(aes(y = target, colour = "target"), alpha = 0.1) +
theme_pander() +
scale_colour_pander() + ggtitle("Gaussian bases and target data for s_sq = 0.1")
I will also plot the Gaussian bases for \(s^2 = 0.2\). These values of \(s^2\) will be discussed further throughout the assignment.
create_test_data(s_sq = 0.2)
values_gb %>%
melt(id.vars = c("x", "target"), variable.name = "gauss_fun") %>%
ggplot(aes(x = x, y = value)) +
geom_line(aes(colour = gauss_fun), size = 1) +
geom_point(aes(y = target, colour = "target"), alpha = 0.1) +
theme_pander() +
scale_colour_pander() + ggtitle("Gaussian bases and target data for s_sq = 0.2")
Estimate the parameters using regularized least squares for sample sizes: 20,40,…,100. provide an overview table with point estimates for the parameters. (2 points)
In this section we first create a function that directly calculates the RLS solution, \(\mathbf{w}_{RLS} = (\lambda I + \Phi^{T}\Phi)^{-1}\Phi^{T}\mathbf{t}\). This is done analytically. The function also checks whether the input is a matrix.
RLS <- function(phi_mat, target, lambda) {
assert_that(is.matrix(phi_mat))
assert_that(is.matrix(target))
w_RLS <- (solve(lambda * diag(dim(phi_mat)[2]) + t(phi_mat) %*% phi_mat) %*% t(phi_mat)) %*% target
return(w_RLS)
}
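As a cross-check - a standalone sketch that restates RLS with a plain stopifnot guard in place of assertthat - setting \(\lambda = 0\) must reproduce the ordinary least-squares fit from lm():

```r
# With lambda = 0, regularized least squares reduces to OLS,
# so the closed-form solution must match lm()'s coefficients.
RLS <- function(phi_mat, target, lambda) {
  stopifnot(is.matrix(phi_mat), is.matrix(target))
  solve(lambda * diag(ncol(phi_mat)) + t(phi_mat) %*% phi_mat) %*%
    t(phi_mat) %*% target
}
set.seed(42)
x <- runif(50, -1, 1)
phi <- cbind(1, x, x^2)   # a small, well-conditioned design
t_vec <- sin(2 * pi * x) + rnorm(50, sd = 0.1)
w0 <- RLS(phi, as.matrix(t_vec), lambda = 0)
fit <- lm(t_vec ~ x + I(x^2))
stopifnot(isTRUE(all.equal(as.vector(w0), unname(coef(fit)),
                           tolerance = 1e-6)))
```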
Now we run a for loop, once for each amount of data. The trick is that I multiply i by twenty and thereby create different amounts of data using the data_gen function. I also decided to use \(\lambda = 0.2\). This is chosen arbitrarily; I could use cross-validation, among other methods, to find the optimal \(\lambda\), but that is not in the scope of this assignment.
w_RLS_result <- matrix(ncol = 5, nrow = 9)
colnames(w_RLS_result) <- c(paste("n", (1:5) * 20, sep = "_"))
for (i in 1:5) {
create_test_data(n = (20 * i), s_sq = 0.1)
w_RLS_result[, i] <- RLS(phi_mat = phi_mat,
target = t_mat,
lambda = 0.2)
}
w_RLS_result
## n_20 n_40 n_60 n_80 n_100
## [1,] 0.64391924 -0.3481212 -0.38341701 -0.86949189 -0.48725426
## [2,] 0.94493238 1.3599656 1.46062805 1.66079584 1.58523100
## [3,] -0.08555535 0.1012209 0.22010891 0.46016282 0.42791562
## [4,] -1.27029381 -1.6878083 -1.85824852 -2.17039266 -2.24630945
## [5,] 0.02273491 0.1172315 0.03677534 -0.05233911 -0.02027794
## [6,] 1.35552388 1.6632600 1.81361979 2.25156961 2.38473462
## [7,] -0.11449517 -0.2901697 -0.27136157 -0.41696594 -0.52801508
## [8,] -1.07262107 -1.2275655 -1.35921956 -1.74553312 -1.87405717
## [9,] 0.23969066 0.3468297 0.39043409 0.76886376 1.06757729
Estimate the parameters using bayesian linear regression assuming that the parameters have multivariate normal prior: 20, 40,…,100. Here, you can preselect the \(\beta\) and \(\alpha\). Provide an overview table with point estimates for the parameters. (2 points)
In this part I use two different methods: Bayesian Linear Regression with and without sequential updating. I chose \(s^2\) equal to 0.1, and decided to use \(\alpha = 1\) and \(\beta = 25\). Here \(\alpha\) scales the prior covariance of the weights (the prior is \(\mathcal{N}(\mathbf{0}, \alpha I)\)), and \(\beta\) is the noise precision, which weights the likelihood against the prior.
BLR <- function(phi_mat, target, alpha, beta) {
assert_that(is.matrix(phi_mat))
assert_that(is.matrix(target))
m_0 <- rep(0, 9)
S_0 <- alpha * diag(1, 9)
S_N <- solve(solve(S_0) + beta * t(phi_mat) %*% phi_mat)
m_N <- S_N %*% (solve(S_0) %*% m_0 + beta * t(phi_mat) %*% target)
return(m_N)
}
BLR_updating <- function(phi_mat, target, alpha, beta) {
assert_that(is.matrix(phi_mat))
assert_that(is.matrix(target))
#setting prior
m_0 <- as.matrix(rep(0, 9))
S_0 <- alpha * diag(1, 9)
# for proper working, m_N needs to be defined beforehand. Let's set it the same.
m_N <- m_0
S_N <- S_0
for (i in 1:dim(phi_mat)[1]) {
m_0 <- m_N # old posterior is new prior.
S_0 <- S_N
phi <- t(as.matrix(phi_mat[i, ]))
S_N <- solve(solve(S_0) + (beta * t(phi) %*% phi))
m_N <- S_N %*% (solve(S_0) %*% m_0 + (beta * t(phi) %*% target[i]))
}
return(m_N)
}
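The claim behind BLR_updating - that processing observations one at a time with conjugate updates yields the same posterior as the batch formula - can be verified on a small synthetic design (a standalone sketch with a 2-column \(\Phi\) instead of the 9 bases):

```r
# Sequential Bayesian updating must reproduce the batch posterior.
set.seed(5)
phi <- cbind(1, runif(20, -1, 1))
t_vec <- rnorm(20)
alpha <- 1; beta <- 25
# batch posterior: S_N = (S_0^-1 + beta Phi'Phi)^-1, m_N = beta S_N Phi' t
S_0 <- alpha * diag(ncol(phi))
S_N <- solve(solve(S_0) + beta * t(phi) %*% phi)
m_N <- S_N %*% (beta * t(phi) %*% t_vec)
# sequential: the old posterior is the new prior
m <- matrix(0, nrow = ncol(phi))
S <- S_0
for (i in seq_len(nrow(phi))) {
  p <- t(phi[i, , drop = FALSE])          # column vector phi(x_i)
  S_new <- solve(solve(S) + beta * p %*% t(p))
  m <- S_new %*% (solve(S) %*% m + beta * p * t_vec[i])
  S <- S_new
}
stopifnot(isTRUE(all.equal(m, m_N)), isTRUE(all.equal(S, S_N)))
```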
BLR_iter <- function(phi_mat, target, alpha, beta){
assert_that(is.matrix(phi_mat))
assert_that(is.matrix(target))
S_N <- solve(alpha * diag(1, 9) + beta * t(phi_mat) %*% phi_mat)
m_N <- beta * S_N %*% t(phi_mat) %*% target
return(list(m_N, S_N))
}
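A useful connection - illustrated here on a small synthetic design rather than the assignment data - is that the posterior mean returned by BLR_iter coincides with the RLS estimate when \(\lambda = \alpha/\beta\), since \(m_N = \beta(\alpha I + \beta\Phi^{T}\Phi)^{-1}\Phi^{T}\mathbf{t} = ((\alpha/\beta)I + \Phi^{T}\Phi)^{-1}\Phi^{T}\mathbf{t}\):

```r
# The BLR posterior mean equals the RLS solution with lambda = alpha/beta.
set.seed(7)
phi <- cbind(1, runif(30, -1, 1), runif(30, -1, 1))
t_vec <- as.matrix(rnorm(30))
alpha <- 0.15; beta <- 100
S_N <- solve(alpha * diag(ncol(phi)) + beta * t(phi) %*% phi)
m_N <- beta * S_N %*% t(phi) %*% t_vec
w_RLS <- solve((alpha / beta) * diag(ncol(phi)) + t(phi) %*% phi) %*%
  t(phi) %*% t_vec
stopifnot(isTRUE(all.equal(m_N, w_RLS)))
```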
We can have a look at the results.
w_BLR_result
## n_20 n_40 n_60 n_80 n_100
## [1,] -1.2795469 -0.06681946 -0.3645774 -1.57445847 -1.5798353
## [2,] 2.0848378 1.26166309 1.4623522 2.41455520 2.5666831
## [3,] 0.3858385 0.48584247 0.5212679 0.50846908 0.3975910
## [4,] -2.2287955 -2.11879521 -2.2631436 -2.69394978 -2.7119086
## [5,] 0.1933852 -0.15541296 -0.3213381 0.03789891 0.1252091
## [6,] 1.6834754 2.37681537 2.7100522 2.64199078 2.5739217
## [7,] -0.3400583 -0.49021670 -0.4025712 -0.53685046 -0.5297915
## [8,] -0.9882390 -1.85381311 -2.4685037 -2.36607109 -2.2071415
## [9,] 0.2786406 1.05420655 1.6414096 1.52114239 1.3035935
w_BLRupd_result
## n_20 n_40 n_60 n_80 n_100
## [1,] -1.2795469 -0.06681946 -0.3645774 -1.57445847 -1.5798353
## [2,] 2.0848378 1.26166309 1.4623522 2.41455520 2.5666831
## [3,] 0.3858385 0.48584247 0.5212679 0.50846908 0.3975910
## [4,] -2.2287955 -2.11879521 -2.2631436 -2.69394978 -2.7119086
## [5,] 0.1933852 -0.15541296 -0.3213381 0.03789891 0.1252091
## [6,] 1.6834754 2.37681537 2.7100522 2.64199078 2.5739217
## [7,] -0.3400583 -0.49021670 -0.4025712 -0.53685046 -0.5297915
## [8,] -0.9882390 -1.85381311 -2.4685037 -2.36607109 -2.2071415
## [9,] 0.2786406 1.05420655 1.6414096 1.52114239 1.3035935
w_BLRiter_result
## n_20 n_40 n_60 n_80 n_100
## [1,] -1.2795469 -0.06681946 -0.3645774 -1.57445847 -1.5798353
## [2,] 2.0848378 1.26166309 1.4623522 2.41455520 2.5666831
## [3,] 0.3858385 0.48584247 0.5212679 0.50846908 0.3975910
## [4,] -2.2287955 -2.11879521 -2.2631436 -2.69394978 -2.7119086
## [5,] 0.1933852 -0.15541296 -0.3213381 0.03789891 0.1252091
## [6,] 1.6834754 2.37681537 2.7100522 2.64199078 2.5739217
## [7,] -0.3400583 -0.49021670 -0.4025712 -0.53685046 -0.5297915
## [8,] -0.9882390 -1.85381311 -2.4685037 -2.36607109 -2.2071415
## [9,] 0.2786406 1.05420655 1.6414096 1.52114239 1.3035935
We can observe that all of these methods provide identical weights. For consistency of the code I will use the BLR_iter function later on; because of the way it is written, it also provides the posterior covariance of the weights.
Discuss how the parameter estimates for the two methods compare, you can use tables and graphs to support your claims. (2 points)
I will use graphs to show how these methods compare.
In the following interactive graph, we can control how many points the models were trained on and then look at how each model fits the data. I set the values to those chosen in the previous parts, and I again select \(s^2 = 0.1\).
Regarding the models, we can observe that in this setup BLR performs better on small samples and quickly converges to the true function. The \(\lambda\) in RLS restricts overfitting of the data, but in this case it is too restrictive, and therefore BLR stays closer to the original function. As we will see later on, the hyperparameters of BLR are here set to almost ideal values.
p1 <- plot_interactive_graph_BLS_RLS(s_squared = 0.1, lambda_RLS = 1, alpha_BLR = 0.15, beta_BLR = 100, use_BLR_iter = TRUE)
p1[[1]]
This can also be observed using RMSE: the restricted RLS has a higher RMSE. As expected, its RMSE decreases as the number of observations grows.
p1[[2]]
## # A tibble: 5 x 3
## n rmse_BLR rmse_RLS
## <dbl> <dbl> <dbl>
## 1 20 0.07468591 0.3848420
## 2 40 0.09344046 0.3517010
## 3 60 0.09524920 0.3335168
## 4 80 0.10307398 0.3183566
## 5 100 0.09645740 0.2966234
In the following graph I set \(\lambda\) in the RLS model really low and chose \(\alpha = 2\) and \(\beta = 2\). We can observe that in this setting RLS outperforms the Bayesian model.
p2 <- plot_interactive_graph_BLS_RLS(s_squared = 0.1, lambda_RLS = 0.001, alpha_BLR = 2, beta_BLR = 2, use_BLR_iter = TRUE)
p2[[1]]
This can again be supported by calculating the RMSE.
p2[[2]]
## # A tibble: 5 x 3
## n rmse_BLR rmse_RLS
## <dbl> <dbl> <dbl>
## 1 20 0.4253003 0.07736551
## 2 40 0.3777786 0.09177116
## 3 60 0.3490395 0.08777033
## 4 80 0.3152287 0.10100379
## 5 100 0.2865531 0.09299874
In the next graph we compare the better variants of the two previous cases. We can observe that they behave almost identically.
p3 <- plot_interactive_graph_BLS_RLS(s_squared = 0.1, lambda_RLS = 0.001, alpha_BLR = 0.15, beta_BLR = 100, use_BLR_iter = TRUE)
p3[[1]]
In this case RLS is just a bit better, which can be seen in the following table:
p3[[2]]
## # A tibble: 5 x 3
## n rmse_BLR rmse_RLS
## <dbl> <dbl> <dbl>
## 1 20 0.07936882 0.07911477
## 2 40 0.10350998 0.10311557
## 3 60 0.09963815 0.09922493
## 4 80 0.09455386 0.09428599
## 5 100 0.09170342 0.09146828
In this section I use a simplified version of BLR based on equations 3.53 and 3.54 of Bishop (2006). Using equations 3.90, 3.91, 3.92 and 3.95, I estimate the hyperparameters \(\alpha\) and \(\beta\), which are afterwards used to estimate the weights of the Gaussian basis functions.
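The fixed-point updates can be sketched as follows; this is a standalone illustration on a small polynomial design rather than the assignment's Gaussian bases, and the starting values and tolerances are arbitrary:

```r
# Evidence approximation: alternate between the posterior (m_N, S_N)
# and the hyperparameter updates until alpha and beta stabilise.
set.seed(3)
n <- 100
x <- runif(n, -1, 1)
phi <- cbind(1, x, x^2, x^3)
t_vec <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
alpha <- 1; beta <- 1
eig <- eigen(t(phi) %*% phi, symmetric = TRUE)$values
for (iter in 1:100) {
  S_N <- solve(alpha * diag(ncol(phi)) + beta * t(phi) %*% phi)
  m_N <- beta * S_N %*% t(phi) %*% t_vec
  lambda_i <- beta * eig                       # eigenvalues of beta*Phi'Phi
  gamma <- sum(lambda_i / (alpha + lambda_i))  # effective nr. of parameters
  alpha_new <- as.numeric(gamma / crossprod(m_N))
  beta_new <- as.numeric((n - gamma) / sum((t_vec - phi %*% m_N)^2))
  if (abs(alpha_new - alpha) < 1e-10 && abs(beta_new - beta) < 1e-10) break
  alpha <- alpha_new
  beta <- beta_new
}
stopifnot(is.finite(alpha), is.finite(beta), alpha > 0, beta > 0)
```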
Results, using \(s^2 = 0.1\): we can see how many iterations it took until the optimal values for \(\alpha\) and \(\beta\) were found, and I also report those optimal values. We have already seen how a model with similar parameters behaves; however, this function calculates the optimal \(\alpha\) and \(\beta\) for any sample size.
## [1] "n 20"
## [1] "iteration 10"
## [1] "alpha 0.186014572953755"
## [1] "beta 152.167680453792"
## [1] "n 40"
## [1] "iteration 16"
## [1] "alpha 0.13214811637829"
## [1] "beta 60.6645517374535"
## [1] "n 60"
## [1] "iteration 10"
## [1] "alpha 0.129381718148116"
## [1] "beta 110.102554623994"
## [1] "n 80"
## [1] "iteration 7"
## [1] "alpha 0.16402139833713"
## [1] "beta 102.504993162466"
## [1] "n 100"
## [1] "iteration 9"
## [1] "alpha 0.122179824931754"
## [1] "beta 88.2264493791938"
w_BLRopt_result
## n_20 n_40 n_60 n_80 n_100
## [1,] -1.9017164 -2.5778221 -2.4514392 -2.25670096 -2.8200250
## [2,] 2.8191962 4.1233676 4.1759201 3.55894245 4.4262458
## [3,] 0.1455936 -0.9115899 -1.5652849 -0.43858546 -1.0531262
## [4,] -2.3944644 -2.1701564 -1.2079632 -2.14485911 -2.2883269
## [5,] -0.1581510 0.2318685 -0.3159097 -0.05338502 0.8034739
## [6,] 2.7294393 1.8026887 1.6359751 2.14691627 0.9338535
## [7,] -0.1265294 0.8915569 1.5618207 0.63003962 1.7338664
## [8,] -3.1438710 -3.9665716 -4.5888076 -3.82248749 -4.5119276
## [9,] 2.1069921 2.4303305 2.7390786 2.46794244 2.6896922
It will also be interesting to see the performance of the model that uses this optimal BLR.
We can easily compare the methods. In this case I intentionally set the parameters of the previous methods so that they do not produce optimal values. This shows the strength of the iterative search for the optimal \(\alpha\) and \(\beta\): BLR with the calculated optimal values quickly converges to the original function and performs better even on larger samples.
p4 <- plot_interactive_graph_BLRopt(s_squared = 0.1, lambda_RLS = 1, alpha_BLR = 1, beta_BLR = 2)
p4[[1]]
We can observe that optimal BLR outperforms both previously presented methods in this case.
p4[[2]]
## # A tibble: 5 x 4
## n rmse_BLR rmse_RLS rmse_BLRopt
## <dbl> <dbl> <dbl> <dbl>
## 1 20 0.3376302 0.3995054 0.06504645
## 2 40 0.2587032 0.3041012 0.08213722
## 3 60 0.2705842 0.3345429 0.07488630
## 4 80 0.2540323 0.3211110 0.08526266
## 5 100 0.2313424 0.2930980 0.09524344
To evaluate the performance of the models, I would use a separate (test) sample and predict it using a model trained on the remaining observations. Afterwards, I would use performance-evaluation metrics such as the already presented RMSE, or MAE. If the problem were of a classification nature, it would be handy to calculate accuracy - how often the right class is predicted - but that is not the current case.
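A minimal sketch of that held-out evaluation - with hypothetical helper names and an arbitrary \(\lambda\), not taken from the code above - trains RLS on one sample and scores RMSE and MAE on a fresh one:

```r
# Train on one sample, evaluate on an independent test sample.
set.seed(11)
make_data <- function(n) {
  x <- runif(n, -1, 1)
  list(x = x, t = sin(2 * pi * x) + rnorm(n, sd = 0.1))
}
# 9 Gaussian bases with s^2 = 0.1, as elsewhere in the assignment
design <- function(x) {
  sapply(seq(-1, 1, 0.25), function(mu) exp(-(x - mu)^2 / (2 * 0.1)))
}
train <- make_data(100)
test <- make_data(100)
phi_tr <- design(train$x)
w <- solve(0.01 * diag(9) + t(phi_tr) %*% phi_tr) %*% t(phi_tr) %*% train$t
pred <- design(test$x) %*% w
rmse <- sqrt(mean((test$t - pred)^2))
mae <- mean(abs(test$t - pred))
stopifnot(mae <= rmse)   # MAE never exceeds RMSE
```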
In the following graph we can observe how the presented models behave when predicting one hundred randomly chosen points. We can control how many points the models were trained on.
plot_interactive_graph(0.1, lambda_RLS = 0.01, alpha_BLR = 0.5, beta_BLR = 20)